Data Analysis on Aspiring Minds Employability Outcomes Data

By- Suraj Honkamble

The Dataset description

The dataset was released by Aspiring Minds from the Aspiring Mind Employment Outcome 2015 (AMEO) study. The study is limited primarily to students from engineering disciplines. The dataset contains the employment outcomes of engineering graduates as dependent variables (Salary, Job Titles, and Job Locations) along with standardized scores from three different areas: cognitive skills, technical skills and personality skills. It also contains demographic features. In total there are around 40 independent variables and 4000 data points. The independent variables are both continuous and categorical in nature, and each candidate has a unique identifier.

Research Questions

  1. Times of India article dated Jan 18, 2019 states that “After doing your Computer Science Engineering if you take up jobs as a Programming Analyst, Software Engineer, Hardware Engineer and Associate Engineer you can earn up to 2.5-3 lakhs as a fresh graduate.” Test this claim with the data given to you.
  2. Is there a relationship between gender and specialisation? (i.e. Does the preference of Specialisation depend on the Gender?)

Import required libraries

Load and read the dataset

Column Information

Missing values

No Missing values.

Univariate Analysis

Drop the Unnamed: 0 column

Analyze ID column

IDs are unique for each row, that is, for each employee.
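The first checks above (missing values, dropping the leftover index column, ID uniqueness) can be sketched as follows. This is a minimal sketch on a hypothetical three-row extract; the ID values and the column names other than `Unnamed: 0` are assumptions for illustration, not rows from the real file.

```python
import io
import pandas as pd

# Hypothetical three-row extract; the real file has about 4000 rows and 40 columns.
raw_csv = io.StringIO(
    ",ID,Salary,JobCity\n"
    "0,1001,420000,Bangalore\n"
    "1,1002,500000,Indore\n"
    "2,1003,325000,Chennai\n"
)
df = pd.read_csv(raw_csv)

# Count missing values in every column.
missing_per_column = df.isna().sum()

# An index saved without a header name is read back as "Unnamed: 0".
df = df.drop(columns=["Unnamed: 0"])

# True when no ID value is repeated.
ids_are_unique = df["ID"].is_unique
```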

Analyze Salary column

Minimum, Maximum, Average and Median Salary

The mean and median of the Salary column are nearly equal, so there may be little chance that this column contains outliers.

Check the distribution of data to detect outliers

There are many outliers in this column and the data is right-skewed. Let's apply a log transformation and check whether the distribution becomes closer to normal.

The distribution now looks roughly normal, and we have got rid of the outliers that lay far away from the bulk of the data.
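The effect of the log transformation can be illustrated with a small sketch. The salaries below are synthetic (drawn from a log-normal distribution with arbitrary parameters), so the numbers are not those of the AMEO data; the point is only that skewness drops sharply after taking the log.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed salaries standing in for the Salary column (assumption).
rng = np.random.default_rng(0)
salary = pd.Series(rng.lognormal(mean=12.6, sigma=0.6, size=2000), name="Salary")

# Skewness before and after the log transformation.
raw_skew = salary.skew()
log_skew = np.log(salary).skew()

print(f"skew before: {raw_skew:.2f}, skew after log: {log_skew:.2f}")
```

A skewness near zero after the transformation is what "looks roughly normal" corresponds to numerically.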

Analyze Date of Joining column

When did the first and last recruitment happen?

The first employee joined the company on 1st June 1991 and the last employee on 1st December 2015.

Recruitment by Year

  1. The first recruitment was in 1991 and the last in 2015.
  2. The most employees were recruited in 2014.
  3. Not a single employee was recruited between 1991 and 2004.

Analyze Date of Leaving column

Date of Leaving should be a datetime column. If the employee is still working, the value is "Present". This is 2015 employability data, so the last date here should be 01-12-2015. It is better to replace "Present" with "01-01-2016" and then convert the column to datetime.
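A minimal sketch of this replacement, assuming the column is called `DOL` and holds ISO-formatted date strings alongside the "Present" marker (the four values below are made up):

```python
import pandas as pd

# Hypothetical Date of Leaving values.
dol = pd.Series(["2015-03-01", "Present", "2014-06-01", "Present"], name="DOL")

# Employees marked "Present" are still working.
still_working = int((dol == "Present").sum())

# Replace the marker with the assumed present date, then convert to datetime.
dol_clean = pd.to_datetime(dol.replace("Present", "2016-01-01"))

# Leavers per calendar year, excluding employees who are still working.
left_per_year = dol_clean[dol != "Present"].dt.year.value_counts().sort_index()
```

Counting the "Present" rows before the replacement keeps the still-working count separate from the placeholder date.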

How many employees are still working in this company?

1875 employees are still working in this company.

How many employees left in each year?

The number of employees who left the company increases every calendar year.

Analyze Designation column

Many designations are repeated, either in a short form or with a slight spelling difference.

Now capitalize the designations so they look like professional titles.

In which job role are most employees working?

  1. There are 405 unique designations.
  2. Out of 3998 employees, more than 500 work as Software Engineer.
  3. Most of the employees are in roles related to the software development field.

Missing values in the Job City column

How many employees work as Business Analyst, Data Analyst, Python Developer and Data Scientist?

Analyze Job cities

Many job cities are repeated with slight spelling changes or with white space at the beginning or end; some are in upper case and some in lower case, and some use old city names that have since changed. Let's apply string methods to clean the names.

We are now left with 230 unique cities, but there are still duplicate cities with small spelling mistakes and a few cities under their old names. Let's change these city names one by one 🥱🥱

After cleaning the city names, we are left with 195 job cities.

Convert these names to title case.
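The city clean-up described above can be sketched as a chain of string methods followed by a lookup of manual corrections. The sample values and the `corrections` mapping are hypothetical; the real mapping has to be built by inspecting the unique values in the data.

```python
import pandas as pd

# Hypothetical raw Job City values with the kinds of problems described above.
job_city = pd.Series(
    [" Bangalore", "bangalore ", "BANGALORE", "Banglore", "Bengaluru", "noida", "Noida "]
)

# Hypothetical manual corrections for misspellings and alternative names.
corrections = {"banglore": "bangalore", "bengaluru": "bangalore"}

cleaned = (
    job_city.str.strip()       # remove leading/trailing white space
    .str.lower()               # make the case uniform
    .replace(corrections)      # map whole values to a canonical spelling
    .str.title()               # convert to title case
)
city_counts = cleaned.value_counts()
```

The same strip, lower and title steps apply to the Designation column.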

In which city are most employees working?

  1. Most employees work in Bangalore.
  2. Nearly 20% of all employees work in Bangalore.
  3. Bangalore, Noida, Hyderabad, Pune and Chennai are the top 5 cities where most employees work.

Analyze Gender column

  1. There are more male employees than female employees.
  2. Nearly 80% of employees are male.

Analyze Date Of Birth

Who are the oldest and youngest employees?

To answer this question we need a present date, and for this dataset the present date can be assumed to be 01-01-2016. Subtract the date of birth from the present date, i.e. 01-01-2016. The subtraction gives a Timedelta result. Using its days component we get the number of days from the date of birth to the present date, and dividing this by 365 gives the age of the employee.

Now take the days component of the Timedelta object and divide it by 365 to get years.

  1. The oldest employee's date of birth is 30-10-1977 and the youngest employee's date of birth is 27-05-1997.
  2. The oldest employee is 38 years old in 2016 and the youngest employee is 18 years old in 2016.
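The age calculation above can be reproduced with the two dates of birth quoted in the list. The column name `DOB` is an assumption; dividing by 365 ignores leap days, which is accurate enough for whole-year ages here.

```python
import pandas as pd

# The oldest and youngest dates of birth reported above.
dob = pd.Series(pd.to_datetime(["1977-10-30", "1997-05-27"]), name="DOB")

# Assumed present date for this dataset.
reference_date = pd.Timestamp("2016-01-01")

# Timedelta -> days component -> whole years.
age_days = (reference_date - dob).dt.days
age_years = age_days // 365

print(age_years.tolist())  # [38, 18]
```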

Analyze Age column

What is the age range to which most of the employees belong?

  1. Most of the employees are in the age range of 22 to 27.
  2. Nearly 70% of all employees are aged 23, 24 or 25.

Analyze 10th Percentage column

Check the distribution of Percentage

  1. Minimum 10th class percentage is 43%, maximum is 97.76% and average is 77.93%.
  2. Most of the employees have a 10th percentage between 70% and 90%.
  3. There are a few employees whose 10th percentage is less than 50%.

Analyze 10th Board column

10th education boards are repeated, some with the full form and some with an abbreviation. Let's clean this column. Here I am changing any state board name to "State Board".
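One way to collapse the board names is a small normalizing function. The sample values and the keyword rules below are assumptions for illustration; the actual spellings in the data may need additional rules.

```python
import pandas as pd

# Hypothetical raw board names.
board_10 = pd.Series(
    [
        "cbse",
        "CBSE ",
        "central board of secondary education",
        "icse",
        "state board",
        "up board",
        "maharashtra state board",
    ]
)

def normalize_board(name: str) -> str:
    """Map a raw board name to CBSE, ICSE or State Board (assumed rules)."""
    text = name.strip().lower()
    if "cbse" in text or "central board" in text:
        return "CBSE"
    if "icse" in text:
        return "ICSE"
    # Everything else is treated as a state board.
    return "State Board"

board_clean = board_10.map(normalize_board)

# Share of employees per board, in percent.
board_share = board_clean.value_counts(normalize=True).mul(100).round(1)
```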

Check the count of Employees by 10th Board

  1. This is expected because many students attended their respective state boards.
  2. Nearly 60% of employees studied under their respective state boards.
  3. Nearly 35-38% of employees studied under the CBSE board.
  4. 2-3% studied under the ICSE board.

Analyze 12th Passing year

  1. Most employees passed 12th in 2009.
  2. The count of employees passing 12th is highest from 2006 to 2010.

Analyze 12th Percentage column

Check the distribution of Percentage

  1. Minimum 12th class percentage is 40%, maximum is 98.7% and average is 78.46%.
  2. Most of the employees have a 12th percentage between 60% and 90%.
  3. There are a few employees whose 12th percentage is less than 50%, and there is one employee whose percentage is 40%.

Analyze 12th Board column

Let's rename every state board to "State Board"; the rest are "CBSE" and "ICSE".

  1. Nearly 60% of employees studied under their respective state boards.
  2. Nearly 38% of employees studied under the CBSE board.
  3. 1-2% studied under the ICSE board.

Analyze College Tier

How many employees belong to each college tier?

90% of employees are from tier 2 colleges.

Analyze Degree column

Count of employees by Degree

  1. 90% Employees are from Engineering discipline (B.E/B.Tech & ME/M.Tech)
  2. 5-6% are from MCA
  3. Only 2 employees are from MSc.

Analyze Specialization or Branch column

Many branch names are repeated here; let's change them to a proper format.

Count of Employees by Specialization

  1. Most employees belong to Computer Science, Electronics & Communication, Information Technology, Mechanical, Electrical, Electronics & Telecommunication, Electronics & Instrumentation and Computer Application.
  2. CS, EC and IT/IS graduates are more likely to be employed in the software field.

Analyze College GPA column

In some universities GPA is calculated on a 10-point scale. Let's convert these values to a 100-point scale.
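A minimal sketch of the rescaling, assuming that any value of 10 or below was recorded on a 10-point scale (the threshold, the column name `collegeGPA` and the sample values are assumptions):

```python
import pandas as pd

# Hypothetical GPA values: two on a 100-point scale, two on a 10-point scale.
college_gpa = pd.Series([71.5, 8.2, 9.93, 64.0], name="collegeGPA")

# Assumption: values of 10 or below are on a 10-point scale.
on_ten_point_scale = college_gpa <= 10

# Keep 100-point values as they are; multiply 10-point values by 10.
gpa_percent = college_gpa.where(~on_ten_point_scale, college_gpa * 10)
```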

  1. Minimum college GPA is 49%, maximum is 99.93% and average is 71.70%.
  2. Most of the employees have a GPA between 60% and 90%.
  3. There are no extreme outliers in the GPA column.

Analyze College City Tier: the tier of the city in which the college is located

Most of the colleges are located in tier 0 cities.

Analyze College State

  1. The highest number of employees studied at colleges in Uttar Pradesh.
  2. A large number of employees come from colleges in Uttar Pradesh, Karnataka, Tamil Nadu, Telangana and Maharashtra.
  3. The fewest employees come from colleges in Assam, Goa, Sikkim, Meghalaya and the Union Territories.

Analyze Bachelors Degree Graduation Year of Employees

There is one unusual entry in Graduation Year, which is "0"; let's see what it is.

This employee's 12th passing year is 2010 and he studied engineering, so logically his graduation year must be 2014.
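The imputation reasoning above (12th passing year plus a four-year engineering degree) can be sketched as follows. The column names `twelfth_year` and `graduation_year` are hypothetical, as is the two-row frame.

```python
import pandas as pd

# Hypothetical rows; the second one has the invalid graduation year 0.
df = pd.DataFrame(
    {
        "twelfth_year": [2009, 2010],
        "graduation_year": [2013, 0],
    }
)

# Assumption: an engineering degree takes four years after 12th.
invalid = df["graduation_year"] == 0
df.loc[invalid, "graduation_year"] = df.loc[invalid, "twelfth_year"] + 4
```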

The highest number of employees completed their bachelor's degree in 2013 and the lowest in 2007.

Analyze scores in AMCAT English, Logical, Quantitative and Domain sections

Distribution of Scores

  1. Some employees scored very well in the exams, and some scores are extraordinary.
  2. In the Domain score there are some scores equal to -1, which means either the candidate did not attend the domain exam or does not have a specific domain.

Minimum and maximum scores in all subjects

Average score in all the subjects is in the range 500 to 510.

Analyze scores of Computer Programming, Electronics and Semiconductor, Computer Science, Mechanical Engg, Electrical Engg, Telecom Engg and Civil Engg.

Distribution of Core Domain Scores

For some employees the domain is different: scores are allotted only in those columns where the employee has taken the exam, and in every other domain column the score is -1. Let's see the distribution for scores >= 0 only.
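Masking the -1 placeholder before summarizing keeps it from dragging the statistics down. A minimal sketch with made-up scores and assumed column names:

```python
import pandas as pd

# Hypothetical domain scores; -1 means the exam was not taken.
scores = pd.DataFrame(
    {
        "ComputerProgramming": [445, -1, 605, 395],
        "ElectronicsAndSemicon": [-1, 466, -1, -1],
    }
)

# Keep only scores >= 0; the -1 placeholders become NaN and are ignored below.
attempted = scores.where(scores >= 0)

# Count of test takers plus minimum, maximum and average per domain.
summary = attempted.agg(["count", "min", "max", "mean"])
```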

  1. In the Computer Programming exam job seekers have scores above 800 marks, but in the rest of the exams the maximum score is much lower.
  2. We can say that there is more competition in the field of computer programming.

Average, minimum and maximum scores in all specific domain exams

Bivariate Analysis

How many years has an employee worked in a company?

There is something wrong in either the DOJ or the DOL column. Let's find out.

38 rows have a DOJ later than the DOL.

DOL must be later than DOJ, but here the inverse has happened. Let's impute these values with the "present date" that we created earlier ("01-01-2016").
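The check and the imputation can be sketched on a hypothetical three-row frame; `tenure_years` is an illustrative column name, and dividing by 365 follows the same convention as the age calculation.

```python
import pandas as pd

# Hypothetical joining and leaving dates; the second row is inverted.
df = pd.DataFrame(
    {
        "DOJ": pd.to_datetime(["2012-06-01", "2014-09-01", "2013-01-01"]),
        "DOL": pd.to_datetime(["2015-06-01", "2013-03-01", "2016-01-01"]),
    }
)
present_date = pd.Timestamp("2016-01-01")

# Rows where the joining date is later than the leaving date.
inverted = df["DOJ"] > df["DOL"]
n_inverted = int(inverted.sum())

# Impute the leaving date of those rows with the assumed present date.
df.loc[inverted, "DOL"] = present_date

# Years worked in the company.
df["tenure_years"] = (df["DOL"] - df["DOJ"]).dt.days / 365
```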

Number of employees with work experience of more than 5 years

  1. A single employee worked a maximum of 24 years in a company.
  2. Many employees worked 0 years in a company.
  3. Average work experience in a company is 1 year and 8 months.
  4. Almost 95% of employees have worked between 0 and 5 years in a company.
  5. Only 59 employees have 5+ years of experience.

Relationship between Years of Experience and Salary

Let's drop some extreme salaries and years of experience and then check the relationship.

Removing salaries above 2.5 lakhs, as there are only 6 data points, and years of experience above 7.

There is a positive relationship between years of experience and salary: as experience increases, salary increases.

Which are the top 15 high paying designation?

Junior Manager, Senior Developer and Data Scientist are the job designations with high average salary.

Which job designations have a lower pay scale?

Secretary, Trainee Software Developer and Visiting Faculty have low average salaries.

Gender-wise salary estimation

Average salary for male and female employees is almost equal.

In which job designations is female employment higher?

Most females work as Software Engineer, System Engineer, Software Developer or Programmer Analyst.

In which job designations is male employment higher?

  1. Most males work as Software Engineer, System Engineer, Software Developer or Programmer Analyst.
  2. Both male and female employees prefer the software field.

Is there any relation between graduation GPA and salary?

Here we need to drop some of the high salaries so we can see the relation clearly.

There is a positive relation between college GPA and salary, but the relation is not strong.

Relationship between employability test exam score and salary

The higher the test score, the higher the chances of getting a high-paying job.

Gender-wise education performance

[Figure: gender-wise education performance]

In 10th, 12th and even in graduation, the performance of female students is better than that of male students 😮😮.

Which graduation specialization has the highest average GPA?

Embedded Systems Technology has the highest average college GPA.

Average Salary by Specialization

  1. Polymer Technology has the highest salary, because there is only one entry for this specialization and the salary for that employee is 7 lakh.
  2. Industrial & Production Engineering, Civil Engineering and Chemical Engineering also have high average salaries.
  3. The difference in average salary between the Polymer Technology specialization and the other specializations is huge.

Which states' students have the highest college GPA?

  1. Students from Jharkhand, West Bengal, Bihar, Tamil Nadu and Orissa have the highest college/graduation GPA.
  2. Students from Maharashtra, Meghalaya, Goa, Sikkim and Rajasthan have the lowest college/graduation GPA.

Research Questions

Que-1: Here the claim is: “After doing your Computer Science Engineering if you take up jobs as a Programming Analyst, Software Engineer, Hardware Engineer and Associate Engineer you can earn up to 2.5-3 lakhs as a fresh graduate.”

Step-1:

Create a DataFrame of employees who graduated in the Computer Science Engineering discipline/specialization.

There are 1354 employees who studied Computer Science Engineering.

Step-2:

Create a DataFrame of Computer Science Engineering graduates working in the Programming Analyst, Software Engineer, Hardware Engineer and Associate Engineer job roles/designations.

283 employees from a Computer Science background work as Programming Analyst, Software Engineer, Hardware Engineer or Associate Engineer.

Step-3:

Check whether the average salary for these job designations lies between 2.5 and 3 lakh or not.

There are some extreme data points in salary, so it is better to take the median.

The average salary clearly shows that "Computer Science Engineering" students working as "Programming Analyst, Software Engineer, Hardware Engineer or Associate Engineer" get an average starting salary above "3.2 lakhs INR". Hence the "claim of Times of India is correct".
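Steps 1 to 3 can be sketched on a small hypothetical frame. The column names `Specialization`, `Designation` and `Salary`, and all five rows, are assumptions for illustration; salaries are in INR, with 1 lakh equal to 100,000.

```python
import pandas as pd

# Hypothetical records; one extreme salary shows why the median is preferred.
df = pd.DataFrame(
    {
        "Specialization": ["Computer Science", "Computer Science", "Computer Science",
                           "Mechanical", "Computer Science"],
        "Designation": ["Software Engineer", "Programming Analyst", "Associate Engineer",
                        "Software Engineer", "Data Analyst"],
        "Salary": [320000, 300000, 2500000, 280000, 400000],
    }
)
roles = ["Programming Analyst", "Software Engineer", "Hardware Engineer", "Associate Engineer"]

# Step 1: Computer Science graduates only.
cs = df[df["Specialization"] == "Computer Science"]

# Step 2: keep the four designations named in the claim.
cs_roles = cs[cs["Designation"].isin(roles)]

# Step 3: compare the central salary with the claimed 2.5-3 lakh range.
mean_salary = cs_roles["Salary"].mean()
median_salary = cs_roles["Salary"].median()
within_claimed_range = 250000 <= median_salary <= 300000
```

In this toy frame the single extreme salary pulls the mean far above the median, which is the reason given above for using the median.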

Que-2 : Is there a relationship between gender and specialisation? (i.e. Does the preference of Specialisation depend on the Gender?)

Most common specializations among females and males

[Figures: most common specializations for female and male employees]

The top 5 most common specializations for females are Computer Science, Electronics & Communication, Information Technology, Computer Application and Electrical Engineering.

The top 5 most common specializations for males are Computer Science, Electronics & Communication, Information Technology, Electrical Engineering and Mechanical Engineering.

Both males and females have the same interests of study, so we cannot say that specialization depends on gender.
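The comparison of top-5 lists above is descriptive. One way to check the same question more formally, which is not part of the analysis above, is a chi-square test of independence on the gender by specialization table. The sketch below computes the statistic by hand on made-up counts; the column names and the counts are assumptions and say nothing about the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical counts of (gender, specialization) pairs.
rows = (
    [("f", "Computer Science")] * 25
    + [("f", "Mechanical")] * 5
    + [("m", "Computer Science")] * 75
    + [("m", "Mechanical")] * 45
)
df = pd.DataFrame(rows, columns=["Gender", "Specialization"])

# Observed contingency table.
observed = pd.crosstab(df["Gender"], df["Specialization"])

# Expected counts if gender and specialization were independent.
row_totals = observed.sum(axis=1).to_numpy()
col_totals = observed.sum(axis=0).to_numpy()
n = observed.to_numpy().sum()
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-square statistic and its degrees of freedom.
chi2_stat = float(((observed.to_numpy() - expected) ** 2 / expected).sum())
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
```

The statistic is compared against a chi-square distribution with `dof` degrees of freedom. A library routine such as `scipy.stats.chi2_contingency` can supply the p-value; for a 2x2 table it applies a continuity correction by default, so its statistic differs slightly from the hand computation.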